Assessment of filter effects

# set some standard parameter for the documents. 
knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)

We can set standard options for our rmd documents using the knitr::opts_chunk$set function. In this case echo = TRUE means the code will be shown, message=FALSE means that internal message from functions will not print to the document, and warning = FALSE means that warning message will not print to the document. These setting will apply to all code block unless you specify differently in a block. More details can be found at the rmd cheat sheet.

# use pacman to load require packages 
if (!require("pacman")) install.packages("pacman") ## important because we will be calling this script mulitple times 
pacman::p_load(dplyr,raster,tmap,plotly)
# set number of sigfigs 
options(scipen=999)
# set tmap to interactive viewing 
tmap::tmap_mode("view")

Load in package and set some specific parameters. We’re using the library pacman to reduce the number of lines need to load in multiple libraries. This is a helpful tool to know about.

# input features
baseDir <- "D:/R_SC_Spatial/intermediateGeospatialR/"
county <- "Bexar"
months <- c("april", "may", "june", "july")
filters <- c(2,6,10)

We will come back to this code block through the lesson. Think about these objects as the input parameters to a function. We need to know where to look for data, what county we are interested in, what months we are evaluating, as well as what filter levels we want to apply to the data. All these parameters were pulled from the workflow we put together in day2_workflow.rmd.

### grab imagery for county 
images <- list.files(paste0(baseDir,"/data/nightLights/", county), pattern = ".tif", full.names = TRUE)
counts <- list.files(paste0(baseDir,"/data/nightLights/"), pattern = "_counts.tif", full.names = TRUE)

In the previous example we just pulled a single radiance and counts image and ran something like what’s listed below

# create a dataframe to store information 

# loop over all filter option 
for(i in filter){
  # create a mask based on the counts raster and filter 
  # apply that mask to the radiance raster 
  # detemine all vals excluding NAs 
  # calculate mean, median and percent area and store in data frame
}

We still need to do all this but nowwe have another level of complexity. We need to apply this process to all months not just one. We will frame it out below

for(m in months){
  # call in radiance and counts imagery base on month 
  # clip and mask the counts imagery based on radiance feature 
  # create a dataframe to store information 
  # loop over all filter option 
  for(i in filter){
    # create a mask based on the counts raster and filter 
    # apply that mask to the radiance raster 
    # detemine all vals excluding NAs 
    # calculate mean, median and percent area and store in data frame
  }
  # Store information from dataframe in comprehesive dataframe. 
}

The indexing and geoprocessing within the loop of the months is nothing new. It is worth noting that we are creating a new dataframe each month rather then outside of the initail loop. This is not the only way to do it, but i find it to be the easiest way. This is hard to visualize at the moment but I’ll just say that if we moved the dataframe initialization outside the first loop we would need to change how we store the data we generate.

We’ll fill out our outline with code.

# loop over months 
for(i in seq_along(months)){
  # select rasters using character match 
  m <- months[i]
  # grab the raster base on match in the file name 
  r1 <- raster::raster(images[grepl(pattern = m, x = images)])
  # create a mask object of the radience feature
  mask <- r1 
  mask[mask > 0] <- 1
  mask[mask != 1] <- NA
  # determine the total number of cells of interest by sum all values.
  totalCells <- sum(values(mask), na.rm = TRUE)
  # pull the correct counts feature base on character match and apply mask
  count1 <- raster::raster(counts[grepl(pattern = m, x = counts)])*mask 
  
  # create df to store results 
  df1 <- data.frame(matrix(nrow = length(filters), ncol = 5))
  colnames(df1) <- c("month","filter","mean","median", "totalArea") 
  df1$month <- m
  df1$filter <- filters
    ## loop over all seq 
    for(j in seq_along(filters)){
      # generate a mask with the counts image based on the seq value 
      c2 <- count1 
      # replace all values based on filter val
      c2[c2 < filters[j]] <- NA 
      # generate a mask base on new filtered data 
      c2[!is.na(c2)]<- 1 
      # apply that mask to radaince value 
      r2 <- r1 * c2 
      # calculate Mean, median of masked radiance raster 
      vals <- raster::values(r2)
      # drop all na values 
      vals <- vals[!is.na(vals)]
      # calculate values and assign features to dataframe
      df1[j,"mean"] <- mean(vals)
      df1[j,"median"] <- median(vals)
      # count total obervation in mask. 
      df1[j,"totalArea"] <- 100*(length(vals)/totalCells) 
    }
  # create a new dataframe object on first pass then add directly to that df on 
  # subsequent passes 
  if(i == 1){
    df <- df1
  }else{
    df <- dplyr::bind_rows(df, df1)
  }
}
df
##    month filter      mean   median   totalArea
## 1  april      2 11.679998 4.518006 100.0000000
## 2  april      6 11.341239 4.686239  68.8123300
## 3  april     10 19.168742 3.585171   0.4533092
## 4    may      2 11.103256 4.410460  97.7334542
## 5    may      6  9.938754 4.010025   3.8077969
## 6    may     10       NaN       NA   0.0000000
## 7   june      2 12.412578 4.768570 100.0000000
## 8   june      6 12.421702 4.846515  99.9093382
## 9   june     10 14.806388 6.737298  38.5312783
## 10  july      2 11.740087 4.596181 100.0000000
## 11  july      6 11.740087 4.596181 100.0000000
## 12  july     10  9.851227 3.099392  76.4279238

We have all our data for each month in a single dataframe and this alone shows some very telling tends. May seems to be the most effected by cloud cover where only four percent of the county area had six or more cloud free days. Where as July is a much more sunny time of the year.

Much like before gleaming these conclusion from the table is possible but visualized well they will jump off the page.

Relationship between number of observations and county level statistics

If we try to simply plug in the data we have to our existing plot workflow we get something strange

p1 <- plot_ly() %>%
  add_trace(x=df$filter, y=df$mean,type = 'scatter',  line = list(dash = 'dash', shape= "spline"))%>%
  layout(xaxis = list(title = "Filter Level "),
            yaxis = list(title = "Mean"))
p1

The function does not know that each month is a unique feature so it is plotting the relationship observed over multiple months on on continuous line. It looks cool, but it’s not informative.

To fix this we need to bring the month data into the process. The can but done easily using a built parameter.

p1 <- plot_ly()%>%
  add_trace(x=df$filter, y=df$mean,type = 'scatter', color = df$month, line = list(dash = 'dash', shape= "spline"))%>%
  layout(xaxis = list(title = "Filter Level "),
            yaxis = list(title = "Mean"))
p1

This looks great. It’s clear and shows the variability across the months. We can add the mean and percent area to get the full picture.

### generate the three specific plots 
# mean
p1 <- plot_ly() %>%
  add_trace(x=df$filter, y=df$mean,type = 'scatter', color = df$month, line = list(dash = 'dash', shape= "spline"))%>%
  layout(xaxis = list(title = "Filter Level"),
            yaxis = list(title = "Mean"))
# median 
p2 <- plot_ly() %>%
    add_trace(x=df$filter, y=df$median,type = 'scatter', color = df$month, line = list(dash = 'dash', shape= "spline"),     showlegend = FALSE)%>%
  layout(xaxis = list(title = "Filter Level"),
         yaxis = list(title = "Median"))
# percent area
p3 <- plot_ly() %>%
  add_trace(x=df$filter, y=df$totalArea,type = 'scatter', color = df$month, line = list(dash = 'dashdot', shape= "spline"),  showlegend = FALSE) %>%
  layout(xaxis = list(title = "Filter Level"),
         yaxis = list(title = "Percent Area"))
### create the subplot
p<- plotly::subplot(p1,p2,p3, nrows = 3, shareX = TRUE, titleY = TRUE)
p 

Plot 1 : The trend of county level mean when filtered based on the number of observations at a location

Plot 2 : The trend of county level median when filtered based on the number of observations at a location

Plot 3 : The percent of the counties total area present when filtered based on the number of observations at a location

Connecting RMD back to R script

As we mentioned at the beginning. This RMD can be called like a function from an R script. We basically need a means of defining the parameters below within an R script so that they can be read in within the RMD.

save your RMD we will be calling it directly in the next section

# input features 
baseDir <- "D:/R_SC_Spatial/intermediateGeospatialR/"
county <- "Bexar"
months <- c("april", "may", "june", "july")
filters <- c(2,6,10)